Numerical Linear

Authors

  • Michael W. Berry
  • Ricardo D. Fierro

Abstract

Current methods to index and retrieve documents from databases usually depend on a lexical match between query terms and keywords extracted from documents in a database. These methods can produce incomplete or irrelevant results because of synonyms and polysemous words. The association of terms with documents (or implicit semantic structure) can be derived using large sparse term-by-document matrices. In fact, both terms and documents can be matched with user queries using representations in k-space (where 100 ≤ k ≤ 200) derived from k of the largest approximate singular vectors of these term-by-document matrices. This completely automated approach, called Latent Semantic Indexing (LSI), uses subspaces spanned by the approximate singular vectors to encode important associative relationships between terms and documents in k-space. Using LSI, two or more documents may be close to each other in k-space (and hence in meaning) yet share no common terms. The focus of this work is to demonstrate the computational advantages of exploiting low-rank orthogonal decompositions such as the ULV (or URV), as opposed to the truncated singular value decomposition (SVD), for the construction of initial and updated rank-k subspaces arising from LSI applications.
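As a rough illustration of the LSI matching described above, the following Python/NumPy sketch builds a tiny hypothetical term-by-document matrix, computes its rank-k truncated SVD, folds a query into k-space as q̂ = qᵀU_kΣ_k⁻¹, and ranks documents by cosine similarity with the rows of V_k. The corpus, the choice k = 2, and the helper names `lsi_factors`, `fold_in_query` and `cosine` are assumptions made for illustration only. The sketch uses the truncated-SVD baseline, not the ULV/URV decompositions this work advocates, and a realistic collection would use a large sparse matrix with k in the 100 to 200 range.

```python
import numpy as np


def lsi_factors(A, k):
    """Rank-k truncated SVD of a dense term-by-document matrix A (terms x docs)."""
    # Thin SVD; numpy returns singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # U_k (t x k), sigma_k (k,), V_k (d x k)


def fold_in_query(q, U_k, s_k):
    """Map a term-space vector into k-space: q_hat = q^T U_k Sigma_k^{-1}."""
    return (q @ U_k) / s_k


def cosine(x, y):
    """Cosine similarity between two nonzero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


# Hypothetical toy corpus: rows are terms, columns are documents d0..d4,
# entries are raw term counts (no term weighting applied).
terms = ["car", "automobile", "flower", "petal"]
A = np.array([
    [1, 1, 0, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 0],  # petal
], dtype=float)

k = 2  # illustrative only; real collections use far larger k
U_k, s_k, V_k = lsi_factors(A, k)

# d0 and d2 share no terms, so they are orthogonal in term space...
raw_cos_02 = cosine(A[:, 0], A[:, 2])
# ...but their rows of V_k point the same way in k-space.
lsi_cos_02 = cosine(V_k[0], V_k[2])

# Query containing only the term "car", folded into k-space and scored
# against each document's row of V_k.
q = np.array([t == "car" for t in terms], dtype=float)
q_hat = fold_in_query(q, U_k, s_k)
scores = [cosine(q_hat, V_k[j]) for j in range(A.shape[1])]

print(f"raw cos(d0, d2) = {raw_cos_02:.3f}, LSI cos(d0, d2) = {lsi_cos_02:.3f}")
print("query 'car' scores:", [f"{x:.3f}" for x in scores])
```

In this toy example, d0 ("car") and d2 ("automobile") share no terms, so their raw cosine is 0, yet their k-space vectors point in the same direction because both terms co-occur in d1. Likewise, the query "car" scores d2 about as highly as d0. Folding a document column in with the same formula reproduces its row of V_k. Comparing against rows of V_k is one common convention, and scaling document vectors by Σ_k is another.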

Publication date: 1996